Reliability-sareh saatian
Reliability Definition

Reliability is one of the most important elements of test quality. It has to do with the consistency, or reproducibility, of an examinee's performance on the test. A test with high reliability, given to the same examinee on two occasions, would yield much the same score both times. A test with poor reliability, on the other hand, might result in very different scores for the examinee across the two test administrations.

Reliability refers to the consistency of a measure. A test is considered reliable if we get the same result repeatedly. For example, if a test is designed to measure a trait, each administration of the test to the same person should give approximately the same result.

Reliability is a much-talked-about measurement concept. It is a major concern to the developers of large-scale tests, who usually devote substantial energy to its calculation. Whenever a test is administered, the test user would like some assurance that the results could be replicated if the same individuals were tested again under similar circumstances. This desired consistency (or reproducibility) of test scores is called reliability. (Crocker and Algina, 1986: 105)

RELIABILITY AND VALIDITY

Reliability without validity would be similar to an archer consistently hitting the target in the same place but missing the bull's eye by a foot. The archer's aim is reliable because it is predictable, but it is not accurate: it never hits what it is expected to hit.

The theory of reliability compares the reliability of a test of human characteristics with the reliability of measuring instruments in the physical sciences. The basic starting point for almost all theories of test reliability is the idea that test scores reflect the influence of two sorts of factors:

1. Factors that contribute to consistency: stable characteristics of the individual or the attribute that one is trying to measure
2. Factors that contribute to inconsistency: features of the individual or the situation that can affect test scores but have nothing to do with the attribute being measured

Some of these inconsistencies include:

• Temporary but general characteristics of the individual: health, fatigue, motivation, emotional strain
• Temporary and specific characteristics of the individual: comprehension of the specific test task, specific tricks or techniques of dealing with the particular test materials, fluctuations of memory, attention or accuracy
• Aspects of the testing situation: freedom from distractions, clarity of instructions, interaction of personality, sex, or race of examiner
• Chance factors: luck in selection of answers by sheer guessing, momentary distractions

The goal of estimating reliability is to determine how much of the variability in test scores is due to errors in measurement and how much is due to variability in true scores. A true score is the replicable feature of the concept being measured. It is the part of the observed score that would recur across different measurement occasions in the absence of error. Errors of measurement are composed of both random error and systematic error. They represent the discrepancies between scores obtained on tests and the corresponding true scores. This conceptual breakdown is typically represented by the simple equation:

Observed test score = true score + errors of measurement

RELIABILITY IN TRUE SCORE THEORY

The true score is the exact measure of the test taker's true ability in the area being tested. With a perfect test, the observed score would be equal to the true score. However, there is no perfect test. According to true score theory, no one can know what the reliability of a test is unless one knows how much random error exists in the test. One cannot know how much error exists in the test unless one knows what the true score is. As a theoretical concept, the true score cannot be known.
Therefore, the reliability of a test can never be known with certainty.

THE THEORY OF RELIABILITY

According to the theory of reliability, when a test is administered to a group of individuals, the observed variance in the distribution of scores should be due only to the true variance in the ability levels of the test takers. The degree to which the two variances match is the reliability of the test. In other words, reliability is equal to the ratio of the variance of the true score to the variance of the observed score. Calculating the ratio of the estimated variance of the true score to the variance of the observed score is the same as calculating the correlation between two observed scores. Therefore, the correlation of two repeated measures of the same test is accepted as an appropriate estimate of the reliability of the test.

Types of Norm Reliability

The reliability of any research is the degree to which it gives a consistent score across a range of measurement.

Inter-rater: Different people, same test.
Test-retest: Same people, different times.
Parallel-forms: Different people, same time, different test.
Internal consistency: Different questions, same construct.

Inter-Rater Reliability

When multiple people are giving assessments of some kind or are the subjects of some test, then similar people should lead to the same resulting scores. It can be used to calibrate people who act as raters. Inter-rater reliability thus evaluates reliability across different people. Two major ways in which inter-rater reliability is used are (a) testing how similarly people categorize items and (b) testing how similarly people score items.

Test-Retest Reliability

An assessment or test of a person should give the same results whenever you apply the test. Test-retest reliability evaluates reliability across time.
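
The statement above that the correlation between two administrations estimates the ratio of true-score variance to observed-score variance can be checked with a small simulation. The sketch below is only an illustration on made-up data: the score scale (mean 50), the true-score and error standard deviations, the sample size, and all function names are assumptions, not values from any real test. It also assumes purely random, independent error, which is the simplest case of the equation observed score = true score + error.

```python
import random


def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def pearson_r(xs, ys):
    """Pearson correlation between two equally long score lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def simulate(n=5000, true_sd=10.0, error_sd=5.0, seed=0):
    """Simulate two administrations of the same test to n examinees.

    All numbers here are illustrative assumptions. With these defaults
    the theoretical reliability is 10^2 / (10^2 + 5^2) = 0.8.
    """
    rng = random.Random(seed)
    # True scores: unknown in practice, known here because we simulate.
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n)]
    # Observed score = true score + random error, once per administration.
    first = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    second = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    ratio = variance(true_scores) / variance(first)
    retest_r = pearson_r(first, second)
    return ratio, retest_r


ratio, retest_r = simulate()
print(f"true/observed variance ratio: {ratio:.3f}")
print(f"test-retest correlation:      {retest_r:.3f}")
```

Both printed values should land close to 0.8. In a real testing situation only the second quantity can be computed, because the true scores are not observable; that is why the test-retest correlation serves as the estimate.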
Parallel-Forms Reliability

One problem with questions or assessments is knowing what questions are the best ones to ask. A way of discovering this is to run two tests in parallel, using different questions. Parallel-forms reliability evaluates different questions and question sets that seek to assess the same construct.

What is Split-Half Reliability?

A test is given once and divided into halves that are scored separately. The scores on one half of the test are then compared with the scores on the other half to assess reliability (Kaplan & Saccuzzo, 2001).

Kuder-Richardson reliability or coefficient alpha

The Kuder-Richardson reliability or coefficient alpha is relatively simple to compute, being based on one administration of the test. It assesses the inter-item consistency of a test by looking at two sources of error:

• Adequacy of content sampling
• Heterogeneity of the domain being sampled

$$ r_{kk} = \frac{k}{k-1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_t^2}\right) $$

where k is the number of items, \sigma_i^2 is the variance of scores on item i, and \sigma_t^2 is the variance of the total test scores.

When asking questions in research, the purpose is to assess the response against a given construct or idea. Different questions that test the same construct should give consistent results. Internal consistency reliability evaluates individual questions in comparison with one another for their ability to give consistently appropriate results.

Improving the Reliability of Classroom Tests:

• Write longer tests
• Pay more attention to the careful construction of the test questions
• Start planning the test and writing the items well ahead of the time the test is to be given
• Write clear directions and use standard administrative procedures

Procedure to Increase Reliability:

1) Decrease the ambiguity of the test items
2) Increase the number of items per objective
3) Provide clear test-taking instructions

Threats to Test Reliability:

(1) Item sampling
(2) Construction of the items
(3) Test administration
(4) Scoring
(5) Difficulty of the test
(6) Student factors

References:

1. Fulcher, G. and Davidson, F. (2007) Language Testing and Assessment: An Advanced Resource Book.
2. Bachman, L.F. (1990) Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
3. Brown, H.D. (2004) Language Assessment: Principles and Classroom Practices. Longman.
4. Farhady, H., Jafarpur, A. and Birjandi, P. (2004) Testing Language Skills: From Theory to Practice. The Center for Studying and Compiling University Books in Humanities (SAMT).
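
As a worked illustration of the coefficient alpha calculation from the Kuder-Richardson section, r_kk = (k / (k - 1)) (1 - Σσ²_i / σ²_t), the sketch below computes it from a small table of item scores. The data set (five examinees, four right/wrong items) and the function names are hypothetical and chosen only so the arithmetic is easy to follow by hand; this is one plain-Python way to apply the formula, not a prescribed procedure.

```python
def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def coefficient_alpha(scores):
    """Coefficient alpha for a table of item scores.

    scores: one row per examinee, each row holding k item scores.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("coefficient alpha needs at least two items")
    # Variance of each item across examinees (the sigma_i^2 terms).
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of the examinees' total scores (sigma_t^2).
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 examinees x 4 items scored 1 (right) or 0 (wrong).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = coefficient_alpha(scores)
print(f"coefficient alpha: {alpha:.3f}")  # prints 0.800
```

For this table the item variances are 0.2, 0.3, 0.3 and 0.2 (sum 1.0) and the variance of the total scores is 2.5, so the formula gives (4/3)(1 - 1.0/2.5) = 0.8.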